ONT 16S Sequencing Files

Raw fast5 files from each Oxford Nanopore (ONT) sequencing batch can be found within the directory named: raw_fast5_files/

A total of 8 sequencing batches were run. Each batch includes 24 samples from the following treatment and donor ids:
      1. UT1, UT2, UT3
      2. UT4, UT5, UT6
      3. PLA1, PLA2, PLA3
      4. PLA4, PLA5, PLA6 *NOTE: These samples had to be re-run so they represent two separate sequencing runs
      5. T1-1, T1-2, T1-3
      6. T1-4, T1-5, T1-6
      7. T2-1, T2-2, T2-3
      8. T2-4, T2-5, T2-6
So for example, the raw fast5 files of samples within UT1, UT2, UT3 would be in: raw_fast5_files/UT1_UT2_UT3_fast5s/

ONT 16S Sequence QC Pipeline

After ONT sequencing the following pipeline was used to demultiplex and quality filter reads:
1. Basecall with Dorado V 0.4.1 https://github.com/nanoporetech/dorado
3. Read quality filtering and trimming with Chopper V 0.6.0 https://github.com/wdecoster/chopper
   Trimmed fastq files can be found in the directory named: trimmed_fastqs
      Chopper parameters used:
          • Min q score: 10
          • Min length: 1,000 bp
          • Head crop: 50
          • Max length: 1690
4. Multiqc results summary using fastp qc V 0.20.1
   Multiqc results for each sequencing batch can be found in the directory named: multiqc
Full pipeline with code is available at: https://github.com/MessyaszA/ONT_demux

Taxonomic Classification

Quality filtered and trimmed reads were then aligned to the HOMD database and biom files were created for import into phyloseq.

o First, built a Kraken2 database of the 16S HOMD sequences (https://www.homd.org/ftp/16S_rRNA_refseq/HOMD_16S_rRNA_RefSeq/current/HOMD_16S_rRNA_RefSeq_V15.23.fasta)

1. Taxonomically classified reads via Kraken2 to the built HOMD 16S database
2. Estimated abundance of taxa via Bracken https://ccb.jhu.edu/software/bracken/
      Krona graphs of the Bracken abundance estimates for each sample can be found in the directory named: krona_bracken
3. Exported Bracken results as a biom file for import into phlyoseq (via taxpasta V 0.6.1 https://taxpasta.readthedocs.io/en/latest/)
Full pipeline with code for these 3 steps is available at: https://github.com/MessyaszA/bracken_taxpasta

Statistical Analysis

Diversity, differential abundance, and relative abundance of taxa were analyzed via Phyloseq and statistical tests were run between the 4 treatments.

The biom file and metadata table imported into phyloseq can be found in the directory named: stats_analysis

Results include:

o Alpha diversity metrics - Observed, Shannon, Simpson (csv tables in stats_analysis directory)
o Alpha diversity statistical tests (results below)
o Alpha diversity boxplots (tiff images in stats_analysis/alpha_diversity_plots directory)
      • *NOTE: Observed richness was run on raw and normalized data. Rarefaction to lowest read number was used as normalization.
      This was included for comparative purposes. Other normalization methods can also be run if requested.
o Pairwise differential abundance test via DeSeq2 (csv table results in stats_analysis/DESEQ2 directory)
      • Volcano plot (tiff image in stats_analysis/DESEQ2 directory)
      • *NOTE: Another differential abundance method for microbiome data (ANCOM-BC
      https://www.bioconductor.org/packages/release/bioc/html/ANCOMBC.html ) can be run if requested.
o Beta diversity statistical tests - Bray-Curtis, Euclidean distances
      • *NOTE: Methods for normalizing beta diversity metrics (i.e. calculating Aitchison distance via the deicode package in Qiime2
      https://library.qiime2.org/plugins/deicode/19/) can be run if requested.
o Beta diversity plots (tiff images in stats_analysis/beta_diversity_plots directory)
o Relative abundance taxa plots (tiff images can be found in stats_analysis/taxaplots directory)

#Libraries we will use
library("DESeq2")
library("ggplot2")
library(Biostrings)
library("phyloseq")
library("microbiome")
library("ggthemes")
library(ggnetwork)
library(cowplot)
library("vegan")
library(magrittr)
library(dplyr)
library(EnhancedVolcano)

#Create a few output directories that will hold the output files. 
dir.create('./DESEQ2')
dir.create('./ANCOM')
# Load data and add metadata
ps_object <- import_biom("./HOMD_biom_merge.biom")

#import medata and add to phyloseq object
metadata <- read.csv("./metadata.csv", row.names = 1)
metadata = sample_data(data.frame(metadata))
sample_data(ps_object) <- metadata

Statistical Analysis Results

Total number of taxa

The total number of taxonomically identified ASVs across all samples

#Check sample numbers
ntaxa(ps_object) #Total taxa
[1] 535

Number of taxonomically classified reads per sample

Samples will have varying numbers of taxonomically classified reads due to different sampling depths during sequencing.

#Check sample numbers
sample_sums(ps_object) #Reads per sample
PLA-6G.trim.fastq_bracken PLA-5F.trim.fastq_bracken PLA-6D.trim.fastq_bracken PLA-6E.trim.fastq_bracken 
                   122253                    134537                    151338                    160823 
PLA-4G.trim.fastq_bracken  T1-6D.trim.fastq_bracken PLA-5H.trim.fastq_bracken PLA-6F.trim.fastq_bracken 
                   134896                    252097                    164959                    111641 
 T1-6E.trim.fastq_bracken PLA-4F.trim.fastq_bracken  T1-5E.trim.fastq_bracken PLA-4D.trim.fastq_bracken 
                   218460                    261085                    327649                    193079 
 T1-5F.trim.fastq_bracken  T1-5D.trim.fastq_bracken  T1-4A.trim.fastq_bracken PLA-5A.trim.fastq_bracken 
                   269287                    296705                    332687                    141418 
PLA-5G.trim.fastq_bracken  T1-6H.trim.fastq_bracken PLA-5B.trim.fastq_bracken  T1-5G.trim.fastq_bracken 
                   244555                    219936                    271721                    281291 
PLA-6B.trim.fastq_bracken  T1-6G.trim.fastq_bracken PLA-4C.trim.fastq_bracken PLA-4A.trim.fastq_bracken 
                   196991                    217395                    308241                    314998 
 T1-4H.trim.fastq_bracken  T1-6F.trim.fastq_bracken  T1-6A.trim.fastq_bracken  T1-4B.trim.fastq_bracken 
                   276626                    205810                    229874                    340421 
 T1-5H.trim.fastq_bracken PLA-4E.trim.fastq_bracken  T1-6C.trim.fastq_bracken  T1-5C.trim.fastq_bracken 
                   307412                    185855                    202763                    291410 
PLA-6H.trim.fastq_bracken  T2-4A.trim.fastq_bracken  T2-6F.trim.fastq_bracken  T2-5D.trim.fastq_bracken 
                   124695                    177019                    121162                    149516 
PLA-5C.trim.fastq_bracken  T1-4G.trim.fastq_bracken  T2-4C.trim.fastq_bracken  T2-4B.trim.fastq_bracken 
                   115282                    229188                    175811                    166040 
 T2-5A.trim.fastq_bracken PLA-6C.trim.fastq_bracken PLA-5D.trim.fastq_bracken PLA-5E.trim.fastq_bracken 
                   143238                    149583                    148025                    147511 
 T2-4G.trim.fastq_bracken PLA-4B.trim.fastq_bracken  T1-6B.trim.fastq_bracken  T2-5E.trim.fastq_bracken 
                   122305                    307852                    233657                    230137 
 T2-5C.trim.fastq_bracken  T2-4F.trim.fastq_bracken  T2-6H.trim.fastq_bracken  T2-6E.trim.fastq_bracken 
                   135595                    136812                    122927                    114585 
 T2-6G.trim.fastq_bracken PLA-4H.trim.fastq_bracken  T2-5F.trim.fastq_bracken  T2-6C.trim.fastq_bracken 
                   111199                    232902                    117429                     99830 
 T2-6A.trim.fastq_bracken  T2-4H.trim.fastq_bracken  T2-5G.trim.fastq_bracken  T1-1E.trim.fastq_bracken 
                   134098                    139078                    141449                    129593 
 T1-4F.trim.fastq_bracken  T2-4E.trim.fastq_bracken  T2-6D.trim.fastq_bracken  T2-5B.trim.fastq_bracken 
                   291101                    158881                    111618                    156670 
 T2-5H.trim.fastq_bracken  T1-4E.trim.fastq_bracken  T1-5A.trim.fastq_bracken  T1-5B.trim.fastq_bracken 
                   146811                    320326                    295293                    335887 
 T1-2C.trim.fastq_bracken PLA-2A.trim.fastq_bracken  T1-1H.trim.fastq_bracken  T1-2F.trim.fastq_bracken 
                   112517                    129688                    112481                    111566 
 T1-3D.trim.fastq_bracken  T1-1A.trim.fastq_bracken  T1-1D.trim.fastq_bracken  T1-1B.trim.fastq_bracken 
                   137279                    156574                    156381                    142894 
 T1-3B.trim.fastq_bracken  T1-3E.trim.fastq_bracken  T1-3F.trim.fastq_bracken  T1-2E.trim.fastq_bracken 
                   143554                    136304                    126753                    137682 
 T1-2G.trim.fastq_bracken  T1-2D.trim.fastq_bracken  T2-6B.trim.fastq_bracken PLA-6A.trim.fastq_bracken 
                   138687                    135530                    107692                    221393 
 T2-4D.trim.fastq_bracken  T1-4D.trim.fastq_bracken  T1-4C.trim.fastq_bracken  T1-1G.trim.fastq_bracken 
                   186787                    516674                    347729                     97588 
 T1-2H.trim.fastq_bracken  T1-1F.trim.fastq_bracken  T1-2B.trim.fastq_bracken  T1-1C.trim.fastq_bracken 
                   132909                    100480                    169725                    150087 
 T1-3G.trim.fastq_bracken  T1-3H.trim.fastq_bracken  T1-2A.trim.fastq_bracken  T1-3C.trim.fastq_bracken 
                   132758                    137643                    121393                    134268 
PLA-1H.trim.fastq_bracken  T1-3A.trim.fastq_bracken PLA-1A.trim.fastq_bracken PLA-1C.trim.fastq_bracken 
                   113231                    164631                    150169                    136540 
PLA-3D.trim.fastq_bracken PLA-3C.trim.fastq_bracken PLA-3A.trim.fastq_bracken PLA-3E.trim.fastq_bracken 
                   128055                    127764                    152664                    137933 
PLA-3H.trim.fastq_bracken PLA-1G.trim.fastq_bracken  UT-5C.trim.fastq_bracken PLA-2F.trim.fastq_bracken 
                   139049                    113533                    142315                    116570 
PLA-2D.trim.fastq_bracken PLA-2C.trim.fastq_bracken PLA-3B.trim.fastq_bracken PLA-2E.trim.fastq_bracken 
                   119736                    124253                    121101                    141694 
PLA-1E.trim.fastq_bracken PLA-2H.trim.fastq_bracken PLA-3G.trim.fastq_bracken PLA-1B.trim.fastq_bracken 
                   139254                    124653                    134659                    139846 
PLA-2G.trim.fastq_bracken PLA-3F.trim.fastq_bracken  UT-6B.trim.fastq_bracken  UT-6G.trim.fastq_bracken 
                   131090                    124131                    149386                    157610 
 UT-4B.trim.fastq_bracken  UT-5E.trim.fastq_bracken PLA-2B.trim.fastq_bracken  UT-4A.trim.fastq_bracken 
                   188401                    204864                    147226                    232458 
 UT-4D.trim.fastq_bracken PLA-1D.trim.fastq_bracken  UT-4G.trim.fastq_bracken PLA-1F.trim.fastq_bracken 
                   230691                    170659                    146773                    114481 
 UT-4E.trim.fastq_bracken  UT-6E.trim.fastq_bracken  UT-6F.trim.fastq_bracken  UT-5D.trim.fastq_bracken 
                   180292                    160952                    150999                    161566 
 UT-5F.trim.fastq_bracken  UT-5B.trim.fastq_bracken  UT-6C.trim.fastq_bracken  UT-4H.trim.fastq_bracken 
                   162768                    194116                    137967                    148853 
 UT-5G.trim.fastq_bracken  UT-5A.trim.fastq_bracken  UT-4C.trim.fastq_bracken  UT-4F.trim.fastq_bracken 
                   155111                    166610                    210096                    174867 
 UT-5H.trim.fastq_bracken  UT-6H.trim.fastq_bracken  T2-2C.trim.fastq_bracken  T2-2H.trim.fastq_bracken 
                   171615                    155942                    224131                    236759 
 T2-1F.trim.fastq_bracken  T2-3D.trim.fastq_bracken  UT-6A.trim.fastq_bracken  UT-6D.trim.fastq_bracken 
                   220654                    226536                    150983                    154152 
 T2-2D.trim.fastq_bracken  T2-1G.trim.fastq_bracken  T2-1D.trim.fastq_bracken  T2-3G.trim.fastq_bracken 
                   212920                    196370                    254486                    241081 
 T2-3C.trim.fastq_bracken  T2-1C.trim.fastq_bracken  T2-2B.trim.fastq_bracken  T2-1B.trim.fastq_bracken 
                   244282                    288104                    247158                    265356 
 T2-3A.trim.fastq_bracken  T2-3F.trim.fastq_bracken  T2-2E.trim.fastq_bracken   UT3D.trim.fastq_bracken 
                   254775                    249430                    228892                    193130 
 T2-3B.trim.fastq_bracken  T2-2G.trim.fastq_bracken  T2-3E.trim.fastq_bracken  T2-1A.trim.fastq_bracken 
                   258304                    236985                    230370                    239366 
 T2-2F.trim.fastq_bracken  T2-1E.trim.fastq_bracken  T2-1H.trim.fastq_bracken  T2-2A.trim.fastq_bracken 
                   197203                    251642                    212102                    219040 
  UT3B.trim.fastq_bracken   UT2C.trim.fastq_bracken   UT2B.trim.fastq_bracken   UT1E.trim.fastq_bracken 
                   303307                    119023                     90053                     95039 
  UT1H.trim.fastq_bracken   UT2A.trim.fastq_bracken  T2-3H.trim.fastq_bracken   UT1F.trim.fastq_bracken 
                    85347                     94188                    246493                    103031 
  UT1G.trim.fastq_bracken   UT1B.trim.fastq_bracken   UT2H.trim.fastq_bracken   UT1A.trim.fastq_bracken 
                    90941                    251468                    131175                    215807 
  UT2E.trim.fastq_bracken   UT3C.trim.fastq_bracken   UT2G.trim.fastq_bracken   UT3G.trim.fastq_bracken 
                   174982                    159762                    230766                    293498 
  UT1D.trim.fastq_bracken   UT2F.trim.fastq_bracken   UT3H.trim.fastq_bracken   UT3E.trim.fastq_bracken 
                   146254                    257994                    263981                    195541 
  UT2D.trim.fastq_bracken   UT3F.trim.fastq_bracken   UT3A.trim.fastq_bracken   UT1C.trim.fastq_bracken 
                   255864                    276688                    284484                    290816 

Alpha Diversity

Measuring species diversity within each sample.

#Function to calculate the mean and the standard deviation for each group.
#data : a data frame
#varname : the name of a column containing the variable to be summarized
#groupnames : vector of column names to be used as grouping variables

# Calculate standard error
sderr <- function(x) {sd(x)/sqrt(length(x))}

data_summary <- function(data, varname, groupnames){
  require(plyr)
  summary_func <- function(x, col){
    c(mean = mean(x[[col]], na.rm=TRUE),
      sd = sderr(x[[col]]), na.rm=TRUE)
  }
  data_sum<-ddply(data, groupnames, .fun=summary_func,
                  varname)
  data_sum <- rename(data_sum, c("mean" = varname))
  return(data_sum)
}

Data table with alpha diversity metrics: Observed, Shannon, Simpson

Example of data table structure (full table saved as csv)

Three alpha diversity metrics are calculated for each sample. The full table (per_sample_ad_dataframe.csv) can be found in the directory named: stats_analysis

alpha_div <- cbind(estimate_richness(ps_object, 
                                                   measures = c('Observed', 'Shannon', 'Simpson')),
                                 sample_data(ps_object))
colnames(alpha_div) <- c('Observed', 'Shannon', 'Simpson')
alpha_div$Labels <- rownames(alpha_div)
ad_df <- alpha_div[,c('Observed', 'Shannon', 'Simpson')]
ad_df <- cbind(ad_df,
                              sample_data(ps_object)[, c('sample', 'group', 'donor', 'treatment')])
colnames(ad_df) <- c('Observed', 'Shannon', 'Simpson', 'sample', 'group', 'donor', 'treatment')
head(ad_df)
write.csv(ad_df, "./per_sample_ad_dataframe.csv")

Data Summary statistics: Observed, Shannon, Simpson

Summary for each treatment

The average of each alpha diversity metric for all samples within a treatment.

# Observed
data_summary(ad_df, varname = "Observed", groupnames = c("treatment"))
#data_summary(ad_df, varname = "Observed", groupnames = c("sample"))
#data_summary(ad_df, varname = "Observed", groupnames = c("donor"))

# Shannon
data_summary(ad_df, varname = "Shannon", groupnames = c("treatment"))
#data_summary(ad_df, varname = "Shannon", groupnames = c("sample"))
#data_summary(ad_df, varname = "Shannon", groupnames = c("donor"))

# Simpson
data_summary(ad_df, varname = "Simpson", groupnames = c("treatment"))
#data_summary(ad_df, varname = "Simpson", groupnames = c("sample"))
#data_summary(ad_df, varname = "Simpson", groupnames = c("donor"))

Alpha diversity statistical tests for significance

Kruskal Wallis and Pairwise Wilcoxon tests

Kruskal-Wallis Rank Sum Test for a category (treatment) on each Alpha Diversity metric calculated. This is a non-parametric method which ranks the data and tests whether samples from each group analyzed come from the same distribution.

Pairwise Wilcoxon Rank Sum Test between groups within a category (treatment) on each Alpha Diversity metric calculated and corrected for multiple testing. This methods compares two independent samples for all group levels and includes the false discovery rate (FDR) multiple test correction. The results below come up as a matrix filled with the p-values for each treatment comparison.

Observed Richness

Testing ASV/species richness between the 4 treatments.

#### Observed
#Kruskal Wallis tests
kruskal.test(Observed ~ treatment, data = ad_df)

    Kruskal-Wallis rank sum test

data:  Observed by treatment
Kruskal-Wallis chi-squared = 11.944, df = 3, p-value = 0.007577
#kruskal.test(Observed ~ sample, data = ad_df)
#kruskal.test(Observed ~ donor, data = ad_df)

#Pairwise Wilcoxon Test
pairwise.wilcox.test(ad_df$Observed, ad_df$treatment, p.adjust.method = 'fdr')

    Pairwise comparisons using Wilcoxon rank sum test with continuity correction 

data:  ad_df$Observed and ad_df$treatment 

          1.5R  8R    Placebo
8R        0.849 -     -      
Placebo   0.179 0.257 -      
Untreated 0.179 0.011 0.011  

P value adjustment method: fdr 

Shannon Diversity

Testing ASV/species richness between the 4 treatments while also taking into account relative abundance (evenness).

#### Shannon
#Kruskal Wallis tests
kruskal.test(Shannon ~ treatment, data = ad_df)

    Kruskal-Wallis rank sum test

data:  Shannon by treatment
Kruskal-Wallis chi-squared = 6.9716, df = 3, p-value = 0.07281
#kruskal.test(Shannon ~ sample, data = ad_df)
#kruskal.test(Shannon ~ donor, data = ad_df)

#Pairwise Wilcoxon Test
pairwise.wilcox.test(ad_df$Shannon, ad_df$treatment, p.adjust.method = 'fdr')

    Pairwise comparisons using Wilcoxon rank sum exact test 

data:  ad_df$Shannon and ad_df$treatment 

          1.5R 8R   Placebo
8R        0.73 -    -      
Placebo   0.92 0.64 -      
Untreated 0.11 0.17 0.10   

P value adjustment method: fdr 

Simpson Diversity

Testing ASV/species richness between the 4 treatments while also taking into account relative abundance (evenness). Simpson diversity gives more weight to common or dominant ASVs (rare ASVs will not affect diversity).

#### Simpson
#Kruskal Wallis tests
kruskal.test(Simpson ~ treatment, data = ad_df)

    Kruskal-Wallis rank sum test

data:  Simpson by treatment
Kruskal-Wallis chi-squared = 3.2214, df = 3, p-value = 0.3587
#kruskal.test(Simpson ~ sample, data = ad_df)
#kruskal.test(Simpson ~ donor, data = ad_df)

#Pairwise Wilcoxon Test
pairwise.wilcox.test(ad_df$Simpson, ad_df$treatment, p.adjust.method = 'fdr')

    Pairwise comparisons using Wilcoxon rank sum exact test 

data:  ad_df$Simpson and ad_df$treatment 

          1.5R 8R   Placebo
8R        0.60 -    -      
Placebo   0.80 0.60 -      
Untreated 0.60 0.60 0.48   

P value adjustment method: fdr 

Alpha diversity boxplots

Images of each plot can be found in: stats_analysis/alpha_diversity_plots

###Group plots
#Observed
obs_ad_plot <- ggplot(ad_df, aes(x=treatment, y=Observed, color = treatment)) + geom_boxplot() + geom_point() + ylab('Observed Richness') + scale_colour_manual(values = c("#954535", "#702963", "#F28C28", "#088F8F"))

obs_ad_plot


ggsave("obs_ad_plot.tiff", dpi = 300, plot = obs_ad_plot, path = "./alpha_diversity_plots/")

#Shannon
shan_ad_plot <- ggplot(ad_df, aes(x=treatment, y=Shannon, color = treatment)) + geom_boxplot() + geom_point() + ylab('Shannon Diversity') + scale_colour_manual(values = c("#954535", "#702963", "#F28C28", "#088F8F"))

shan_ad_plot


ggsave("shan_ad_plot.tiff", dpi = 300, plot = shan_ad_plot, path = "./alpha_diversity_plots/")

#Simpson
simp_ad_plot <- ggplot(ad_df, aes(x=treatment, y=Simpson, color = treatment)) + geom_boxplot() + geom_point() + ylab('Simpson Diversity') + scale_colour_manual(values = c("#954535", "#702963", "#F28C28", "#088F8F"))

simp_ad_plot


ggsave("simp_ad_plot.tiff", dpi = 300, plot = simp_ad_plot, path = "./alpha_diversity_plots/")

Alpha Diversity on Rarefied Data

Normalize Observed Richness metric by rarefying to lowest sample sum number

Shannon and Simpson diversity indices are already normalized by relative abundance.

Rarefaction read depth

One method of normalization is to rarefy each sample to the minimum read depth found across all samples. The minimum read depth for all samples is:

min(sample_sums(ps_object)) #Minimum reads (used for rarefying)
[1] 85347

Example of datatable structure for rarefied data observed richness (full table saved as csv)

Observed richness is calculated for each sample now rarefied to the minimum read depth. The full table (rarefied_per_sample_ad_dataframe.csv) can be found in the directory named: stats_analysis

#Rarefy to lowest sample_sums
rare_num <- min(sample_sums(ps_object))
ps_rare <- rarefy_even_depth(ps_object, sample.size = rare_num, rngseed = 999)

#Get datatable with Observed richness for rarefied samples
alpha_div_rare <- cbind(estimate_richness(ps_rare, 
                                                   measures = c('Observed')),
                                 sample_data(ps_rare))
colnames(alpha_div_rare) <- c('Observed_rare')
alpha_div_rare$Labels <- rownames(alpha_div_rare)
ad_df_rare <- alpha_div_rare[,c('Observed_rare')]
ad_df_rare <- cbind(ad_df_rare,
                              sample_data(ps_rare)[, c('sample', 'group', 'donor', 'treatment')])
colnames(ad_df_rare) <- c('Observed_rare', 'sample', 'group', 'donor', 'treatment')
head(ad_df_rare)
write.csv(ad_df_rare, "./rarefied_per_sample_ad_dataframe.csv")

Data summary for rarefied data (observed richness)

The average observed richness for all samples within a treatment.

# Observed Rarefied data summary stats
data_summary(ad_df_rare, varname = "Observed_rare", groupnames = c("treatment"))
#data_summary(ad_df_rare, varname = "Observed_rare", groupnames = c("sample"))
#data_summary(ad_df_rare, varname = "Observed_rare", groupnames = c("donor"))

Observed richness statistical tests for rarefied data

Testing ASV/species richness between the 4 treatments.


#### Observed Rarefied
#Kruskal Wallis tests
kruskal.test(Observed_rare ~ treatment, data = ad_df_rare)

    Kruskal-Wallis rank sum test

data:  Observed_rare by treatment
Kruskal-Wallis chi-squared = 12.925, df = 3, p-value = 0.004801
#kruskal.test(Observed_rare ~ sample, data = ad_df_rare)
#kruskal.test(Observed_rare ~ donor, data = ad_df_rare)

#Pairwise Wilcoxon Test
pairwise.wilcox.test(ad_df_rare$Observed_rare, ad_df_rare$treatment, p.adjust.method = 'fdr')

    Pairwise comparisons using Wilcoxon rank sum test with continuity correction 

data:  ad_df_rare$Observed_rare and ad_df_rare$treatment 

          1.5R   8R     Placebo
8R        0.6733 -      -      
Placebo   0.3884 0.6733 -      
Untreated 0.0635 0.0066 0.0066 

P value adjustment method: fdr 

Rarefied data observed richness boxplot

Images of the plot can be found in: stats_analysis/alpha_diversity_plots

#Plots: Observed Rarefied
obs_ad_plot_rare <- ggplot(ad_df_rare, aes(x=treatment, y=Observed_rare, color = treatment)) + geom_boxplot() + geom_point() + ylab('Observed Richness Rarefied Data') + scale_colour_manual(values = c("#954535", "#702963", "#F28C28", "#088F8F"))

obs_ad_plot_rare


ggsave("rarefied_obs_ad_plot.tiff", dpi = 300, plot = obs_ad_plot_rare, path = "./alpha_diversity_plots/")

Differential Abundance Analysis

Pairwise differential abundance test between treatments via DESeq2

The package DESeq2 provides methods to test for differential abundance by use of negative binomial generalized linear models. DESeq2 uses the Wald test to take into account the dispersion model distance and negative binomial distribution we see in sequencing data. Overall this method statistically infers systematic changes between conditions, as compared to within-condition variability. It estimates dispersion and logarithmic fold changes to test which sequences/ASVs are significantly changing within the experimental design.

Additionally, a p-value correction is applied for multiple hypothesis testing. Traditionally p < 0.05 is considered significant (95% chance it is not by random chance), but when you are looking at thousands of ASVs/sequences, you’ll still end up with 5% of them significant off predicted chance. One of the best corrections to use called False-Discovery Rate correction (FDR), which is the Benjamini and Hochberg method.

Here we ran DESeq2 specifying only treatment in the experimental design. Log fold changes and adjusted p-valus (padj) for all ASVs and significant ASVs have been saved as csv tables in the stats_analysis/DESEQ2 directory (named dds_result_table.csv and sig_result_table.csv respectively).

The below plot shows the fitted curve (red line) used to calculate the final dispersion estimates (blue dots) of the input data (black dots).

#Phyloseq to Deseq2
ps_deseq = phyloseq_to_deseq2(ps_object, ~treatment)

#Run Deseq2
dds = DESeq(ps_deseq, test="Wald", fitType="local")
dds_result <- results(dds)
dds_result_table <- cbind(as(dds_result, "data.frame"), as(tax_table(ps_object)[rownames(dds_result), ], "matrix"))
write.csv(dds_result_table, "./DESEQ2/dds_result_table.csv")

#check significant results
dds_sig = dds_result[which(dds_result$padj < 0.05), ]
dds_sig_table = cbind(as(dds_sig, "data.frame"), as(tax_table(ps_object)[rownames(dds_sig), ], "matrix"))
dim(dds_sig_table)
write.csv(dds_sig_table, "./DESEQ2/sig_result_table.csv")

#Perform the variance stabilizing transformation on the data and pull the normalized hit count matrix. 
vst <- varianceStabilizingTransformation(dds)
mat <- assay(vst)

#If you have a batch designed in, regress the batch effects. 
#mat <- limma::removeBatchEffect(mat, vst$batch)
#assay(vst) <- mat

#Plot the overall dispersion of the experiment to ensure it aligns with expectations. 
plotDispEsts(dds)

Volcano plot: DeSeq2 results

Volcano plots are a great way to visualize the pairwise comparisons. Each shows the log2 fold-change (log2FC) on the x-axis and -log10(padj) on the y-axis. This should usually look akin to a Plinian eruption (hence volcano plot).

Image of the volcano plot can be found in the stats_analysis/DESEQ2/ directory

p <- EnhancedVolcano(dds_result_table,
        lab=as.character(dds_result_table$Rank9),
        title="Pairwise comparisons of taxa (species level) between treatments",
        x='log2FoldChange',
        y='padj',
        legendPosition = 'bottom',
        legendLabels=c('Not sig.','Log (base 2) FC','padj-value','padj-value & Log (base 2) FC'),
        pCutoff=0.05, 
        FCcutoff=1
        )
print(p)

ggsave("volcano_plot.tiff", dpi = 300, plot = p, path = "./DESEQ2/")

Beta Diversity

Measuring species diversity or the level or similarity/dissimilarity of species between samples.

#Pairwise Adonis function
pairwise.adonis.dm <- function(x,factors,stratum=NULL,p.adjust.m="fdr",perm=999){
  
  library(vegan)
  if(class(x) != "dist") stop("x must be a dissimilarity matrix (dist object)")
  co = as.matrix(combn(unique(factors),2))
  pairs = c()
  F.Model =c()
  R2 = c()
  p.value = c()
  
  
  
  for(elem in 1:ncol(co)){
    sub_inds <- factors %in% c(as.character(co[1,elem]),as.character(co[2,elem]))
    resp <- as.matrix(x)[sub_inds,sub_inds]
    ad = adonis2(as.dist(resp) ~
                  
                  factors[sub_inds], strata=stratum[sub_inds], permutations=perm);
    
    pairs = c(pairs,paste(co[1,elem],'vs',co[2,elem]));
    F.Model =c(F.Model,ad$F[1]);
    R2 = c(R2,ad$R2[1]);
    p.value = c(p.value,ad$`Pr(>F)`[1])
    
  }
  
  p.adjusted = p.adjust(p.value,method=p.adjust.m)
  pairw.res = data.frame(pairs,F.Model,R2,p.value,p.adjusted)
  return(pairw.res)
}
# Calculate distance matrices
ps_bc <- phyloseq::distance(ps_object, method = "bray")
ps_euc <- phyloseq::distance(ps_object, method = "euclidean")

PERMANOVAs

PERMANOVA’s with Adonis - Permutational Multivariate Analysis of Variance. This measures dissimilarity in response to one or more factors in an analysis of variance design. Dissimilarity statistics are used to measure distances between data points and test whether they are significantly different.

We run the Adonis test specifying the treatment groups in the model formula.

Bray-curtis

Bray-Curtis dissimilarity examines the abundances of ASVs that are shared between two samples, and the number of ASVs found in each. Bray-Curtis dissimilarity ranges from 0-1. If 0, the two samples share all the same ASVs; if 1, they don’t share any ASVs.

Permanova result

The value under the column labeled ‘Pr(>F)’ in the results table indicates significance (i.e. p-value)

#make a dataframe out of the sample data
sampledf <- data.frame(sample_data(ps_object))

adonis2(ps_bc ~ treatment, data = sampledf) 
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = ps_bc ~ treatment, data = sampledf)
           Df SumOfSqs      R2      F Pr(>F)  
treatment   3    1.026 0.02703 1.7409  0.062 .
Residual  188   36.936 0.97297                
Total     191   37.962 1.00000                
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Euclidean

Measures the distance between two samples in Euclidean space (the length of the line segment between them).

Permanova result

The value under the column labeled ‘Pr(>F)’ in the results table indicates significance (i.e. p-value)

adonis2(ps_euc ~ treatment, data = sampledf) 
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

adonis2(formula = ps_euc ~ treatment, data = sampledf)
           Df   SumOfSqs      R2      F Pr(>F)   
treatment   3 3.0164e+10 0.03385 2.1953  0.008 **
Residual  188 8.6105e+11 0.96615                 
Total     191 8.9121e+11 1.00000                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Pairwise comparisons

A multi-level Adonis test using the FDR multiple test correction.

Bray-curtis pairwise result

pairwise.adonis.dm(ps_bc, sampledf$treatment, p.adjust.m = "fdr")
NA

Euclidean pairwise result

pairwise.adonis.dm(ps_euc, sampledf$treatment, p.adjust.m = "fdr") 
NA

Beta diversiy PCoA and NMDS plots

PCoA and NMDS plots are used to visualize the similarities/dissimilarities of sample points using their calculated distance/dissimilarity matrices (from the Bray-curtis and Euclidean distance measurements). Samples that are closer are more related and vice versa. PCoA plots maximize the linear correlation between samples, wherein NMDS plots maximize the rank-order correlation between samples. Additionally, in case of NMDS, data is not required to fit a normal distribution.

Images of each plot can be found in the stats_analysis/beta_diversity_plots/ directory.

Bray-curtis PCoA

# Bray-curtis
#PCOA BC
ps_bc_pcoa <- ordinate(ps_object, distance = ps_bc, "PCoA")

ps_bc_pcoa_plot <- plot_ordination(ps_object, ps_bc_pcoa, color = "treatment") + 
  geom_point(size = 3) + 
  theme_classic() +
  theme(legend.position = "bottom", legend.title = element_blank()) + 
  scale_colour_manual(values = c("#954535", "#702963", "#F28C28", "#088F8F")) +
  theme(plot.title = element_text(hjust = 0.5))
ps_bc_pcoa_plot


#save as tiff
ggsave("bray_pcoa_plot.tiff", dpi = 300, plot = ps_bc_pcoa_plot, path = "./beta_diversity_plots/")

Bray-curtis NMDS

#NMDS BC
ps_bc_nmds <- ordinate(ps_object, distance = ps_bc, "NMDS")
ps_bc_nmds_plot <- plot_ordination(ps_object, ps_bc_nmds, color = "treatment") + 
  geom_point(size = 3) + 
  theme_classic() +
  theme(legend.position = "bottom", legend.title = element_blank()) + 
  scale_colour_manual(values = c("#954535", "#702963", "#F28C28", "#088F8F")) +
  theme(plot.title = element_text(hjust = 0.5))
ps_bc_nmds_plot


#save as tiff
ggsave("bray_nmds_plot.tiff", dpi = 300, plot = ps_bc_nmds_plot, path = "./beta_diversity_plots/")

Euclidean PCoA


# Euclidean
#PCOA EUC
ps_euc_pcoa <- ordinate(ps_object, distance = ps_euc, "PCoA")

ps_euc_pcoa_plot <- plot_ordination(ps_object, ps_euc_pcoa, color = "treatment") + 
  geom_point(size = 3) + 
  theme_classic() +
  theme(legend.position = "bottom", legend.title = element_blank()) + 
  scale_colour_manual(values = c("#954535", "#702963", "#F28C28", "#088F8F")) +
  theme(plot.title = element_text(hjust = 0.5))
ps_euc_pcoa_plot


#save as tiff
ggsave("euc_pcoa_plot.tiff", dpi = 300, plot = ps_euc_pcoa_plot, path = "./beta_diversity_plots/")

Euclidean NMDS

#NMDS BC
ps_euc_nmds <- ordinate(ps_object, distance = ps_euc, "NMDS")
ps_euc_nmds_plot <- plot_ordination(ps_object, ps_euc_nmds, color = "treatment") + 
  geom_point(size = 3) + 
  theme_classic() +
  theme(legend.position = "bottom", legend.title = element_blank()) + 
  scale_colour_manual(values = c("#954535", "#702963", "#F28C28", "#088F8F")) +
  theme(plot.title = element_text(hjust = 0.5))
ps_euc_nmds_plot


#save as tiff
ggsave("euc_nmds_plot.tiff", dpi = 300, plot = ps_euc_nmds_plot, path = "./beta_diversity_plots/")

Relative Abundance Taxa Plots

Relative abundance taxa bar plots for treatment

Plots were made at the phylum, genus, and species level, comparing across the 4 treatments.

Images of each plot can be found in the stats_analysis/taxaplots/ directory.

Phylum

# transform to relative abundance
ps_rel <- transform(ps_object, "compositional")

# melt the data into a table
ps_rel_melt<- ps_rel %>%
  psmelt()

# Get phylum with mean relative abundance across all samples 
ps_rel_phylum_sum <- ps_rel_melt%>% group_by(Rank3) %>% dplyr::summarise(Aver = mean(Abundance))
names_ps_rel_phylum <- ps_rel_phylum_sum$Rank3
names_ps_rel_phylum
#Phylum Plot
taxaplot_ps_rel_phylum = ggplot(ps_rel_melt
                    , aes(x = treatment, y=Abundance)) + 
  geom_bar(stat="identity", position="fill", aes(fill = Rank3))  +
  scale_fill_manual(values=c("aquamarine4", "gold","lightpink", "firebrick","#DA5724","ivory4",
                             "orchid", "#CBD588", "#8569D5", "#D14285", "#652926","grey80",
                             "#5E738F","#D1A33D", "#8A7C64","lightgreen","aquamarine4",
                             "aquamarine2", "lightsalmon", "#CD9BCD")) +
  theme_bw() +
  ylab("Relative Abundance") + xlab("Treatment") + theme(axis.text.x = element_text(angle = 90, vjust=0.5, hjust=0))
taxaplot_ps_rel_phylum <- taxaplot_ps_rel_phylum + guides(fill=guide_legend(title="Phylum"))
taxaplot_ps_rel_phylum


ggsave("taxaplot_phylum.tiff", dpi = 300, plot = taxaplot_ps_rel_phylum, path = "./taxaplots/")

Genus

Genera with a relative abundance lower than 0.0001 were grouped together.


#Get genera with mean realtive abundance >0.0001 across all samples 
ps_rel_genus_sum <- ps_rel_melt%>% group_by(Rank7) %>% dplyr::summarise(Aver = mean(Abundance))
ps_rel_genus_sub <- ps_rel_genus_sum[which(ps_rel_genus_sum$Aver > 0.0001),]
names_ps_rel_genus <- ps_rel_genus_sub$Rank7
names_ps_rel_genus
# Replace genera with <0.001 abundance with "NA"
ps_rel_melt$Rank7[ps_rel_melt$Rank7 != "Abiotrophia" &
                           ps_rel_melt$Rank7 != "Aerococcus" &
                           ps_rel_melt$Rank7 != "Dialister" &
                           ps_rel_melt$Rank7 != "Granulicatella" &
                           ps_rel_melt$Rank7 != "Ignavibacterium" &
                           ps_rel_melt$Rank7 != "Lacticaseibacillus" &
                           ps_rel_melt$Rank7 != "Lacticaseibacillus" &
                           ps_rel_melt$Rank7 != "Lactiplantibacillus" &
                           ps_rel_melt$Rank7 != "Lactobacillus" &
                           ps_rel_melt$Rank7 != "Lancefieldella" &
                           ps_rel_melt$Rank7 != "Lentilactobacillus" &
                           ps_rel_melt$Rank7 != "Levilactobacillus" &
                           ps_rel_melt$Rank7 != "Limosilactobacillus" &
                           ps_rel_melt$Rank7 != "Megasphaera" &
                           ps_rel_melt$Rank7 != "Moryella" &
                           ps_rel_melt$Rank7 != "Shuttleworthia" &
                           ps_rel_melt$Rank7 != "Solobacterium" &
                           ps_rel_melt$Rank7 != "Staphylococcus" &
                           ps_rel_melt$Rank7 != "Streptococcus" &
                           ps_rel_melt$Rank7 != "Veillonella"] <- NA

#replace NA with "Rel. Abund.<0.0001"
ps_rel_melt[is.na(ps_rel_melt)]<-"Rel. Abund.<0.0001"
#Genera plot
taxaplot_ps_rel_genus = ggplot(ps_rel_melt
                    , aes(x = treatment, y=Abundance)) + 
  geom_bar(stat="identity", position="fill", aes(fill = Rank7))  +
  scale_fill_manual(values=c("aquamarine4", "gold","lightpink", "firebrick","#DA5724","ivory4",
                             "orchid", "#CBD588", "#8569D5", "#D14285", "#652926","grey80",
                             "#5E738F","#D1A33D", "#8A7C64","lightgreen","aquamarine4",
                             "aquamarine2", "lightsalmon", "#CD9BCD")) +
  theme_bw() +
  ylab("Relative Abundance") + xlab("Treatment") + theme(axis.text.x = element_text(angle = 90, vjust=0.5, hjust=0))
taxaplot_ps_rel_genus <- taxaplot_ps_rel_genus + guides(fill=guide_legend(title="Genus"))
taxaplot_ps_rel_genus

ggsave("taxaplot_genus.tiff", dpi = 300, plot = taxaplot_ps_rel_genus, path = "./taxaplots/")

Species

Species with a relative abundance lower than 0.01 were grouped together.

#Get species with mean realtive abundance >0.01 across all samples 
ps_rel_species_sum <- ps_rel_melt%>% group_by(Rank9) %>% dplyr::summarise(Aver = mean(Abundance))
ps_rel_species_sub <- ps_rel_species_sum[which(ps_rel_species_sum$Aver > 0.01),]
names_ps_rel_species <- ps_rel_species_sub$Rank9
names_ps_rel_species
# Replace genera with <0.001 abundance with "NA"
# Replace genera with <0.001 abundance with "NA"
ps_rel_melt$Rank9[ps_rel_melt$Rank9 != "Lactobacillus crispatus" &
                           ps_rel_melt$Rank9 != "Lactobacillus jensenii" &
                           ps_rel_melt$Rank9 != "Limosilactobacillus fermentum" &
                           ps_rel_melt$Rank9 != "Limosilactobacillus vaginalis" &
                           ps_rel_melt$Rank9 != "Megasphaera micronuciformis" &
                           ps_rel_melt$Rank9 != "Streptococcus anginosus" &
                           ps_rel_melt$Rank9 != "Streptococcus mutans" &
                           ps_rel_melt$Rank9 != "Streptococcus salivarius" &
                           ps_rel_melt$Rank9 != "Streptococcus vestibularis" &
                           ps_rel_melt$Rank9 != "Veillonella atypica" &
                           ps_rel_melt$Rank9 != "Veillonella dispar" &
                           ps_rel_melt$Rank9 != "Veillonella genomosp. P1 oral clone MB5_P17" &
                           ps_rel_melt$Rank9 != "Veillonella sp. oral taxon 158"] <- NA

#replace NA with "Rel. Abund.<0.01"
ps_rel_melt[is.na(ps_rel_melt)]<-"Rel. Abund.<0.01"
#Species plot
taxaplot_ps_rel_species = ggplot(ps_rel_melt
                    , aes(x = treatment, y=Abundance)) + 
  geom_bar(stat="identity", position="fill", aes(fill = Rank9))  +
  scale_fill_manual(values=c("aquamarine4", "gold","lightpink", "firebrick","#DA5724","ivory4",
                             "orchid", "#CBD588", "#8569D5", "#D14285", "#652926","grey80",
                             "#5E738F", "#D1A33D")) +
  theme_bw() +
  ylab("Relative Abundance") + xlab("Treatment") + theme(axis.text.x = element_text(angle = 90, vjust=0.5, hjust=0))
taxaplot_ps_rel_species <- taxaplot_ps_rel_species + guides(fill=guide_legend(title="Species"))
taxaplot_ps_rel_species

ggsave("taxaplot_species.tiff", dpi = 300, plot = taxaplot_ps_rel_species, path = "./taxaplots/")

Relative abundance top 10 taxa bar plots per sample

Plots were made at the phylum, genus, and species level, comparing across all samples.

Images of each plot can be found in the stats_analysis/taxaplots/ directory.

Top 10 Phyla

# transform to relative abundance
ps_rel <- transform(ps_object, "compositional")

#Top 10 PHYLA
#Get phylum level
ps_phylum <- tax_glom(ps_rel, taxrank= "Rank3")
#Get top 10 phylum
phylum_top10 <- names(sort(taxa_sums(ps_phylum), decreasing=TRUE))[1:10]
ps_phylum_top10 <- prune_taxa(phylum_top10, ps_phylum)

# melt the data into a table
ps_phylum_melt<- ps_phylum_top10 %>%
  psmelt()

# Plot top 10 phylum by sample
taxaplot_ps_top10_phylum = ggplot(ps_phylum_melt
                    , aes(x = sample, y=Abundance)) + 
  geom_bar(stat="identity", position="fill", aes(fill = Rank3))  +
  scale_fill_manual(values=c("aquamarine4", "gold","lightpink", "firebrick","#DA5724","ivory4",
                             "orchid", "#CBD588", "#8569D5", "#D14285")) +
  theme_bw() +
  ylab("Relative Abundance") + xlab("Sample") + theme(axis.text.x = element_text(angle = 90, vjust=0.5, hjust=0)) +
  theme(legend.title = element_text(size=9), legend.text = element_text(size=7))
taxaplot_ps_top10_phylum <- taxaplot_ps_top10_phylum + guides(fill=guide_legend(title="Top 10 Phyla"))
taxaplot_ps_top10_phylum


ggsave("taxaplot_top10_phylum.tiff", dpi = 300, width = 18, height = 8, units = "in", plot = taxaplot_ps_top10_phylum, path = "./taxaplots/")

Top 10 Genera


#Top 10 GENERA
#Get genus level
ps_genus <- tax_glom(ps_rel, taxrank= "Rank7")
#Get top 10 genera
genus_top10 <- names(sort(taxa_sums(ps_genus), decreasing=TRUE))[1:10]
ps_genus_top10 <- prune_taxa(genus_top10, ps_genus)

# melt the data into a table
ps_genus_melt<- ps_genus_top10 %>%
  psmelt()
#head(ps_genus_melt)

# Plot top 10 phylum by sample
taxaplot_ps_top10_genus = ggplot(ps_genus_melt
                    , aes(x = sample, y=Abundance)) + 
  geom_bar(stat="identity", position="fill", aes(fill = Rank7))  +
  scale_fill_manual(values=c("aquamarine4", "gold3","lightpink", "firebrick","#DA5724","ivory4",
                             "orchid", "#CBD588", "#8569D5", "#D14285")) +
  theme_bw() +
  ylab("Relative Abundance") + xlab("Sample") + theme(axis.text.x = element_text(angle = 90, vjust=0.5, hjust=0)) +
  theme(legend.title = element_text(size=9), legend.text = element_text(size=7))
taxaplot_ps_top10_genus <- taxaplot_ps_top10_genus + guides(fill=guide_legend(title="Top 10 Genera"))
taxaplot_ps_top10_genus


ggsave("taxaplot_top10_genus.tiff", dpi = 300, width = 18, height = 8, units = "in", plot = taxaplot_ps_top10_genus, path = "./taxaplots/")

Top 10 Species


#Top 10 SPECIES
#Get species level
ps_species <- tax_glom(ps_rel, taxrank= "Rank9")
#Get top 10 genera
species_top10 <- names(sort(taxa_sums(ps_species), decreasing=TRUE))[1:10]
ps_species_top10 <- prune_taxa(species_top10, ps_species)

# melt the data into a table
ps_species_melt<- ps_species_top10 %>%
  psmelt()
#head(ps_species_melt)

# Plot top 10 phylum by sample
taxaplot_ps_top10_species = ggplot(ps_species_melt
                    , aes(x = sample, y=Abundance)) + 
  geom_bar(stat="identity", position="fill", aes(fill = Rank9))  +
  scale_fill_manual(values=c("aquamarine4", "gold3","lightpink", "firebrick","#DA5724","ivory4",
                             "orchid", "#CBD588", "#8569D5", "#D14285")) +
  theme_bw() +
  ylab("Relative Abundance") + xlab("Sample") + theme(axis.text.x = element_text(angle = 90, vjust=0.5, hjust=0)) +
  theme(legend.title = element_text(size=9), legend.text = element_text(size=7))
taxaplot_ps_top10_species <- taxaplot_ps_top10_species + guides(fill=guide_legend(title="Top 10 Species"))
taxaplot_ps_top10_species


ggsave("taxaplot_top10_species.tiff", dpi = 300, width = 18, height = 8, units = "in", plot = taxaplot_ps_top10_species, path = "./taxaplots/")
---
title: "16S ONT Sequence and Statistical Analysis"
output:
  html_notebook:
    code_folding: hide
    toc: true
    toc_float: true
  word_document:
    toc: true
  html_document:
    toc: true
    df_print: paged
  pdf_document:
    toc: true
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```
## ONT 16S Sequencing Files
Raw fast5 files from each Oxford Nanopore (ONT) sequencing batch can be found within the directory named: raw_fast5_files/

| A total of 8 sequencing batches were run. Each batch includes 24 samples from the following treatment and donor ids:

|       1. UT1, UT2, UT3

|       2. UT4, UT5, UT6

|       3. PLA1, PLA2, PLA3

|       4. PLA4, PLA5, PLA6 *NOTE: These samples had to be re-run so they represent two separate sequencing runs

|       5. T1-1, T1-2, T1-3

|       6. T1-4, T1-5, T1-6

|       7. T2-1, T2-2, T2-3

|       8. T2-4, T2-5, T2-6

| So for example, the raw fast5 files of samples within UT1, UT2, UT3 would be in: raw_fast5_files/UT1_UT2_UT3_fast5s/

## ONT 16S Sequence QC Pipeline

| After ONT sequencing the following pipeline was used to demultiplex and quality filter reads:

| 1. Basecall with Dorado V 0.4.1 https://github.com/nanoporetech/dorado

| 2. Demultiplex with Guppy V 6.4.6 (guppy_barcoder: https://community.nanoporetech.com/docs/prepare/library_prep_protocols/Guppy-protocol/v/gpb_2003_v1_revax_14dec2018/barcoding-demultiplexing) 

| 3. Read quality filtering and trimming with Chopper V 0.6.0 https://github.com/wdecoster/chopper 
|    Trimmed fastq files can be found in the directory named: trimmed_fastqs

|       Chopper parameters used:

|           •	Min q score: 10

|           •	Min length: 1,000 bp 

|           •	Head crop: 50

|           •	Max length: 1690

| 4. Multiqc results summary using fastp qc V 0.20.1 
|    Multiqc results for each sequencing batch can be found in the directory named: multiqc

| Full pipeline with code is available at: https://github.com/MessyaszA/ONT_demux 

### Taxonomic Classification
Quality filtered and trimmed reads were then aligned to the HOMD database and biom files were created for import into phyloseq.

o	First, built a Kraken2 database of the 16S HOMD sequences (https://www.homd.org/ftp/16S_rRNA_refseq/HOMD_16S_rRNA_RefSeq/current/HOMD_16S_rRNA_RefSeq_V15.23.fasta) 

| 1. Taxonomically classified reads via Kraken2 to the built HOMD 16S database

| 2. Estimated abundance of taxa via Bracken https://ccb.jhu.edu/software/bracken/ 
|       Krona graphs of the Bracken abundance estimates for each sample can be found in the directory named: krona_bracken

| 3. Exported Bracken results as a biom file for import into phlyoseq (via taxpasta V 0.6.1 https://taxpasta.readthedocs.io/en/latest/) 

| Full pipeline with code for these 3 steps is available at: https://github.com/MessyaszA/bracken_taxpasta 

## Statistical Analysis
Diversity, differential abundance, and relative abundance of taxa were analyzed via Phyloseq and statistical tests were run between the 4 treatments. 

The biom file and metadata table imported into phyloseq can be found in the directory named: stats_analysis

Results include:

| o Alpha diversity metrics - Observed, Shannon, Simpson (csv tables in stats_analysis directory)

| o	Alpha diversity statistical tests (results below)

| o	Alpha diversity boxplots (tiff images in stats_analysis/alpha_diversity_plots directory)

|       •	*NOTE: Observed richness was run on raw and normalized data. Rarefaction to lowest read number was used as normalization.
|       This was included for comparative purposes. Other normalization methods can also be run if requested.

| o	Pairwise differential abundance test via DeSeq2 (csv table results in stats_analysis/DESEQ2 directory)

|       •	Volcano plot (tiff image in stats_analysis/DESEQ2 directory)

|       •	*NOTE: Another differential abundance method for microbiome data (ANCOM-BC
|       https://www.bioconductor.org/packages/release/bioc/html/ANCOMBC.html ) can be run if requested.


| o	Beta diversity statistical tests - Bray-Curtis, Euclidean distances

|       •	*NOTE: Methods for normalizing beta diversity metrics (i.e. calculating Aitchison distance via the deicode package in Qiime2
|       https://library.qiime2.org/plugins/deicode/19/) can be run if requested. 

| o	Beta diversity plots (tiff images in stats_analysis/beta_diversity_plots directory)

| o	Relative abundance taxa plots (tiff images can be found in stats_analysis/taxaplots directory)
```{r Libraries,  message=FALSE, warning=FALSE}

#Libraries we will use
library("DESeq2")
library("ggplot2")
library(Biostrings)
library("phyloseq")
library("microbiome")
library("ggthemes")
library(ggnetwork)
library(cowplot)
library("vegan")
library(magrittr)
library(dplyr)
library(EnhancedVolcano)

#Create a few output directories that will hold the output files. 
dir.create('./DESEQ2')
dir.create('./ANCOM')
```

```{r data_load1, warning=FALSE}
# Load data and add metadata
ps_object <- import_biom("./HOMD_biom_merge.biom")

#import medata and add to phyloseq object
metadata <- read.csv("./metadata.csv", row.names = 1)
metadata = sample_data(data.frame(metadata))
sample_data(ps_object) <- metadata
```
## Statistical Analysis Results

### Total number of taxa
The total number of taxonomically identified ASVs across all samples
```{r data_load2, warning=FALSE}
#Check sample numbers
ntaxa(ps_object) #Total taxa
```
### Number of taxonomically classified reads per sample
Samples will have varying numbers of taxonomically classified reads due to different sampling depths during sequencing. 
```{r data_load3, warning=FALSE}
#Check sample numbers
sample_sums(ps_object) #Reads per sample
```
## Alpha Diversity
Measuring species diversity within each sample.
```{r functions}
#Function to calculate the mean and the standard deviation for each group.
#data : a data frame
#varname : the name of a column containing the variable to be summarized
#groupnames : vector of column names to be used as grouping variables

# Calculate standard error
sderr <- function(x) {sd(x)/sqrt(length(x))}

data_summary <- function(data, varname, groupnames){
  require(plyr)
  summary_func <- function(x, col){
    c(mean = mean(x[[col]], na.rm=TRUE),
      sd = sderr(x[[col]]), na.rm=TRUE)
  }
  data_sum<-ddply(data, groupnames, .fun=summary_func,
                  varname)
  data_sum <- rename(data_sum, c("mean" = varname))
  return(data_sum)
}
```

### Data table with alpha diversity metrics: Observed, Shannon, Simpson
#### Example of data table structure (full table saved as csv)
Three alpha diversity metrics are calculated for each sample.
The full table (per_sample_ad_dataframe.csv) can be found in the directory named: stats_analysis
```{r data_table, warning = FALSE, message=FALSE}
alpha_div <- cbind(estimate_richness(ps_object, 
                                                   measures = c('Observed', 'Shannon', 'Simpson')),
                                 sample_data(ps_object))
colnames(alpha_div) <- c('Observed', 'Shannon', 'Simpson')
alpha_div$Labels <- rownames(alpha_div)
ad_df <- alpha_div[,c('Observed', 'Shannon', 'Simpson')]
ad_df <- cbind(ad_df,
                              sample_data(ps_object)[, c('sample', 'group', 'donor', 'treatment')])
colnames(ad_df) <- c('Observed', 'Shannon', 'Simpson', 'sample', 'group', 'donor', 'treatment')
head(ad_df)
write.csv(ad_df, "./per_sample_ad_dataframe.csv")
```
### Data Summary statistics: Observed, Shannon, Simpson
#### Summary for each treatment
The average of each alpha diversity metric for all samples within a treatment. 
```{r summary_stats, warning = FALSE, message=FALSE}
# Observed
data_summary(ad_df, varname = "Observed", groupnames = c("treatment"))
#data_summary(ad_df, varname = "Observed", groupnames = c("sample"))
#data_summary(ad_df, varname = "Observed", groupnames = c("donor"))

# Shannon
data_summary(ad_df, varname = "Shannon", groupnames = c("treatment"))
#data_summary(ad_df, varname = "Shannon", groupnames = c("sample"))
#data_summary(ad_df, varname = "Shannon", groupnames = c("donor"))

# Simpson
data_summary(ad_df, varname = "Simpson", groupnames = c("treatment"))
#data_summary(ad_df, varname = "Simpson", groupnames = c("sample"))
#data_summary(ad_df, varname = "Simpson", groupnames = c("donor"))
```
### Alpha diversity statistical tests for significance
#### Kruskal Wallis and Pairwise Wilcoxon tests
Kruskal-Wallis Rank Sum Test for a category (treatment) on each Alpha Diversity metric calculated. This is a non-parametric method which ranks the data and tests whether samples from each group analyzed come from the same distribution. 

Pairwise Wilcoxon Rank Sum Test between groups within a category (treatment) on each Alpha Diversity metric calculated and corrected for multiple testing. This methods compares two independent samples for all group levels and includes the false discovery rate (FDR) multiple test correction. The results below come up as a matrix filled with the p-values for each treatment comparison.

#### Observed Richness 
Testing ASV/species richness between the 4 treatments.
```{r observed_ad, warning = FALSE, message=FALSE}
#### Observed
#Kruskal Wallis tests
kruskal.test(Observed ~ treatment, data = ad_df)
#kruskal.test(Observed ~ sample, data = ad_df)
#kruskal.test(Observed ~ donor, data = ad_df)

#Pairwise Wilcoxon Test
pairwise.wilcox.test(ad_df$Observed, ad_df$treatment, p.adjust.method = 'fdr')
```
#### Shannon Diversity
Testing ASV/species richness between the 4 treatments while also taking into account relative abundance (evenness).
```{r shannon_ad, warning = FALSE, message=FALSE}
#### Shannon
#Kruskal Wallis tests
kruskal.test(Shannon ~ treatment, data = ad_df)
#kruskal.test(Shannon ~ sample, data = ad_df)
#kruskal.test(Shannon ~ donor, data = ad_df)

#Pairwise Wilcoxon Test
pairwise.wilcox.test(ad_df$Shannon, ad_df$treatment, p.adjust.method = 'fdr')
```
#### Simpson Diversity
Testing ASV/species richness between the 4 treatments while also taking into account relative abundance (evenness). Simpson diversity gives more weight to common or dominant ASVs (rare ASVs will not affect diversity).
```{r simpson_ad, warning = FALSE, message=FALSE}
#### Simpson
#Kruskal Wallis tests
kruskal.test(Simpson ~ treatment, data = ad_df)
#kruskal.test(Simpson ~ sample, data = ad_df)
#kruskal.test(Simpson ~ donor, data = ad_df)

#Pairwise Wilcoxon Test
pairwise.wilcox.test(ad_df$Simpson, ad_df$treatment, p.adjust.method = 'fdr')
```

### Alpha diversity boxplots
Images of each plot can be found in: stats_analysis/alpha_diversity_plots
```{r AD, warning=FALSE, message=FALSE, fig.align="center", results='hide', fig.height=6, fig.width=7}
###Group plots
#Observed
obs_ad_plot <- ggplot(ad_df, aes(x=treatment, y=Observed, color = treatment)) + geom_boxplot() + geom_point() + ylab('Observed Richness') + scale_colour_manual(values = c("#954535", "#702963", "#F28C28", "#088F8F"))

obs_ad_plot

ggsave("obs_ad_plot.tiff", dpi = 300, plot = obs_ad_plot, path = "./alpha_diversity_plots/")

#Shannon
shan_ad_plot <- ggplot(ad_df, aes(x=treatment, y=Shannon, color = treatment)) + geom_boxplot() + geom_point() + ylab('Shannon Diversity') + scale_colour_manual(values = c("#954535", "#702963", "#F28C28", "#088F8F"))

shan_ad_plot

ggsave("shan_ad_plot.tiff", dpi = 300, plot = shan_ad_plot, path = "./alpha_diversity_plots/")

#Simpson
simp_ad_plot <- ggplot(ad_df, aes(x=treatment, y=Simpson, color = treatment)) + geom_boxplot() + geom_point() + ylab('Simpson Diversity') + scale_colour_manual(values = c("#954535", "#702963", "#F28C28", "#088F8F"))

simp_ad_plot

ggsave("simp_ad_plot.tiff", dpi = 300, plot = simp_ad_plot, path = "./alpha_diversity_plots/")
```
## Alpha Diversity on Rarefied Data
### Normalize Observed Richness metric by rarefying to lowest sample sum number
Shannon and Simpson diversity indices are already normalized by relative abundance. 

### Rarefaction read depth
One method of normalization is to rarefy each sample to the minimum read depth found across all samples. The minimum read depth for all samples is: 
```{r rare_num, warning=FALSE}
min(sample_sums(ps_object)) #Minimum reads (used for rarefying)
```
#### Example of datatable structure for rarefied data observed richness (full table saved as csv)
Observed richness is calculated for each sample now rarefied to the minimum read depth.
The full table (rarefied_per_sample_ad_dataframe.csv) can be found in the directory named: stats_analysis
```{r AD_rarefied, warning=FALSE, message=FALSE}
#Rarefy to lowest sample_sums
rare_num <- min(sample_sums(ps_object))
ps_rare <- rarefy_even_depth(ps_object, sample.size = rare_num, rngseed = 999)

#Get datatable with Observed richness for rarefied samples
alpha_div_rare <- cbind(estimate_richness(ps_rare, 
                                                   measures = c('Observed')),
                                 sample_data(ps_rare))
colnames(alpha_div_rare) <- c('Observed_rare')
alpha_div_rare$Labels <- rownames(alpha_div_rare)
ad_df_rare <- alpha_div_rare[,c('Observed_rare')]
ad_df_rare <- cbind(ad_df_rare,
                              sample_data(ps_rare)[, c('sample', 'group', 'donor', 'treatment')])
colnames(ad_df_rare) <- c('Observed_rare', 'sample', 'group', 'donor', 'treatment')
head(ad_df_rare)
write.csv(ad_df_rare, "./rarefied_per_sample_ad_dataframe.csv")
```
#### Data summary for rarefied data (observed richness)
The average observed richness for all samples within a treatment. 
```{r ad_rare_summary, warning=FALSE, message=FALSE}
# Observed Rarefied data summary stats
data_summary(ad_df_rare, varname = "Observed_rare", groupnames = c("treatment"))
#data_summary(ad_df_rare, varname = "Observed_rare", groupnames = c("sample"))
#data_summary(ad_df_rare, varname = "Observed_rare", groupnames = c("donor"))
```

#### Observed richness statistical tests for rarefied data
Testing ASV/species richness between the 4 treatments.
```{r ad_rare_stats, warning=FALSE, message=FALSE}

#### Observed Rarefied
#Kruskal Wallis tests
kruskal.test(Observed_rare ~ treatment, data = ad_df_rare)
#kruskal.test(Observed_rare ~ sample, data = ad_df_rare)
#kruskal.test(Observed_rare ~ donor, data = ad_df_rare)

#Pairwise Wilcoxon Test
pairwise.wilcox.test(ad_df_rare$Observed_rare, ad_df_rare$treatment, p.adjust.method = 'fdr')
```
#### Rarefied data observed richness boxplot
Images of the plot can be found in: stats_analysis/alpha_diversity_plots
```{r ad_rare_plot, warning=FALSE, message=FALSE, fig.align="center", results='hide', fig.height=6, fig.width=7}
#Plots: Observed Rarefied
obs_ad_plot_rare <- ggplot(ad_df_rare, aes(x=treatment, y=Observed_rare, color = treatment)) + geom_boxplot() + geom_point() + ylab('Observed Richness Rarefied Data') + scale_colour_manual(values = c("#954535", "#702963", "#F28C28", "#088F8F"))

obs_ad_plot_rare

ggsave("rarefied_obs_ad_plot.tiff", dpi = 300, plot = obs_ad_plot_rare, path = "./alpha_diversity_plots/")
```

## Differential Abundance Analysis
### Pairwise differential abundance test between treatments via DESeq2
The package DESeq2 provides methods to test for differential abundance by use of negative binomial generalized linear models. DESeq2 uses the Wald test to take into account the dispersion model distance and negative binomial distribution we see in sequencing data. Overall this method statistically infers systematic changes between conditions, as compared to within-condition variability. It estimates dispersion and logarithmic fold changes to test which sequences/ASVs are significantly changing within the experimental design. 

Additionally, a p-value correction is applied for multiple hypothesis testing. Traditionally p < 0.05 is considered significant (95% chance it is not by random chance), but when you are looking at thousands of ASVs/sequences, you'll still end up with 5% of them significant off predicted chance. One of the best corrections to use called False-Discovery Rate correction (FDR), which is the Benjamini and Hochberg method.

Here we ran DESeq2 specifying only treatment in the experimental design. Log fold changes and adjusted p-valus (padj) for all ASVs and significant ASVs have been saved as csv tables in the stats_analysis/DESEQ2 directory (named dds_result_table.csv and sig_result_table.csv respectively).

The below plot shows the fitted curve (red line) used to calculate the final dispersion estimates (blue dots) of the input data (black dots). 
```{r Deseq, warning=FALSE, message=FALSE, fig.align="center", results='hide', fig.height=8, fig.width=9}
#Phyloseq to Deseq2
ps_deseq = phyloseq_to_deseq2(ps_object, ~treatment)

#Run Deseq2
dds = DESeq(ps_deseq, test="Wald", fitType="local")
dds_result <- results(dds)
dds_result_table <- cbind(as(dds_result, "data.frame"), as(tax_table(ps_object)[rownames(dds_result), ], "matrix"))
write.csv(dds_result_table, "./DESEQ2/dds_result_table.csv")

#check significant results
dds_sig = dds_result[which(dds_result$padj < 0.05), ]
dds_sig_table = cbind(as(dds_sig, "data.frame"), as(tax_table(ps_object)[rownames(dds_sig), ], "matrix"))
dim(dds_sig_table)
write.csv(dds_sig_table, "./DESEQ2/sig_result_table.csv")

#Perform the variance stabilizing transformation on the data and pull the normalized hit count matrix. 
vst <- varianceStabilizingTransformation(dds)
mat <- assay(vst)

#If you have a batch designed in, regress the batch effects. 
#mat <- limma::removeBatchEffect(mat, vst$batch)
#assay(vst) <- mat

#Plot the overall dispersion of the experiment to ensure it aligns with expectations. 
plotDispEsts(dds)
```

### Volcano plot: DeSeq2 results
Volcano plots are a great way to visualize the pairwise comparisons. Each shows the log2 fold-change (log2FC) on the x-axis and -log10(padj) on the y-axis. This should usually look akin to a Plinian eruption (hence volcano plot).

Image of the volcano plot can be found in the stats_analysis/DESEQ2/ directory
```{r Volcanoes, warning=FALSE, message=FALSE, fig.align="center", results='hide', fig.height=8, fig.width=9}
p <- EnhancedVolcano(dds_result_table,
        lab=as.character(dds_result_table$Rank9),
        title="Pairwise comparisons of taxa (species level) between treatments",
        x='log2FoldChange',
        y='padj',
        legendPosition = 'bottom',
        legendLabels=c('Not sig.','Log (base 2) FC','padj-value','padj-value & Log (base 2) FC'),
        pCutoff=0.05, 
        FCcutoff=1
        )
print(p)
ggsave("volcano_plot.tiff", dpi = 300, plot = p, path = "./DESEQ2/")
```
## Beta Diversity
Measuring species diversity or the level or similarity/dissimilarity of species between samples.
```{r}
#Pairwise Adonis function
pairwise.adonis.dm <- function(x,factors,stratum=NULL,p.adjust.m="fdr",perm=999){
  
  library(vegan)
  if(class(x) != "dist") stop("x must be a dissimilarity matrix (dist object)")
  co = as.matrix(combn(unique(factors),2))
  pairs = c()
  F.Model =c()
  R2 = c()
  p.value = c()
  
  
  
  for(elem in 1:ncol(co)){
    sub_inds <- factors %in% c(as.character(co[1,elem]),as.character(co[2,elem]))
    resp <- as.matrix(x)[sub_inds,sub_inds]
    ad = adonis2(as.dist(resp) ~
                  
                  factors[sub_inds], strata=stratum[sub_inds], permutations=perm);
    
    pairs = c(pairs,paste(co[1,elem],'vs',co[2,elem]));
    F.Model =c(F.Model,ad$F[1]);
    R2 = c(R2,ad$R2[1]);
    p.value = c(p.value,ad$`Pr(>F)`[1])
    
  }
  
  p.adjusted = p.adjust(p.value,method=p.adjust.m)
  pairw.res = data.frame(pairs,F.Model,R2,p.value,p.adjusted)
  return(pairw.res)
}
```

```{r}
# Calculate distance matrices
ps_bc <- phyloseq::distance(ps_object, method = "bray")
ps_euc <- phyloseq::distance(ps_object, method = "euclidean")
```
### PERMANOVAs
PERMANOVA's with Adonis -  Permutational Multivariate Analysis of Variance. This measures dissimilarity in response to one or more factors in an analysis of variance design. Dissimilarity statistics are used to measure distances between data points and test whether they are significantly different.

We run the Adonis test specifying the treatment groups in the model formula. 

#### Bray-curtis
Bray-Curtis dissimilarity examines the abundances of ASVs that are shared between two samples, and the number of ASVs found in each. Bray-Curtis dissimilarity ranges from 0-1. If 0, the two samples share all the same ASVs; if 1, they don’t share any ASVs.

#### Permanova result
The value under the column labeled 'Pr(>F)' in the results table indicates significance (i.e. p-value)
```{r}
#make a dataframe out of the sample data
sampledf <- data.frame(sample_data(ps_object))

adonis2(ps_bc ~ treatment, data = sampledf) 
```
#### Euclidean
Measures the distance between two samples in Euclidean space (the length of the line segment between them).

#### Permanova result
The value under the column labeled 'Pr(>F)' in the results table indicates significance (i.e. p-value)
```{r}
adonis2(ps_euc ~ treatment, data = sampledf) 
```

### Pairwise comparisons
A multi-level Adonis test using the FDR multiple test correction. 

#### Bray-curtis pairwise result
```{r}
pairwise.adonis.dm(ps_bc, sampledf$treatment, p.adjust.m = "fdr")

```
#### Euclidean pairwise result
```{r}
pairwise.adonis.dm(ps_euc, sampledf$treatment, p.adjust.m = "fdr") 

```
### Beta diversiy PCoA and NMDS plots
PCoA and NMDS plots are used to visualize the similarities/dissimilarities of sample points using their calculated distance/dissimilarity matrices (from the Bray-curtis and Euclidean distance measurements). Samples that are closer are more related and vice versa. 
PCoA plots maximize the linear correlation between samples, wherein NMDS plots maximize the rank-order correlation between samples. Additionally, in case of NMDS, data is not required to fit a normal distribution.

Images of each plot can be found in the stats_analysis/beta_diversity_plots/ directory. 

#### Bray-curtis PCoA
```{r BD_bray, warning=FALSE, message=FALSE, fig.align="center", results='hide', fig.height=7, fig.width=8}
# Bray-curtis
#PCOA BC
ps_bc_pcoa <- ordinate(ps_object, distance = ps_bc, "PCoA")

ps_bc_pcoa_plot <- plot_ordination(ps_object, ps_bc_pcoa, color = "treatment") + 
  geom_point(size = 3) + 
  theme_classic() +
  theme(legend.position = "bottom", legend.title = element_blank()) + 
  scale_colour_manual(values = c("#954535", "#702963", "#F28C28", "#088F8F")) +
  theme(plot.title = element_text(hjust = 0.5))
ps_bc_pcoa_plot

#save as tiff
ggsave("bray_pcoa_plot.tiff", dpi = 300, plot = ps_bc_pcoa_plot, path = "./beta_diversity_plots/")
```
#### Bray-curtis NMDS
```{r BD_bray2, warning=FALSE, message=FALSE, fig.align="center", results='hide', fig.height=7, fig.width=8}
#NMDS BC
ps_bc_nmds <- ordinate(ps_object, distance = ps_bc, "NMDS")

ps_bc_nmds_plot <- plot_ordination(ps_object, ps_bc_nmds, color = "treatment") + 
  geom_point(size = 3) + 
  theme_classic() +
  theme(legend.position = "bottom", legend.title = element_blank()) + 
  scale_colour_manual(values = c("#954535", "#702963", "#F28C28", "#088F8F")) +
  theme(plot.title = element_text(hjust = 0.5))
ps_bc_nmds_plot

#save as tiff
ggsave("bray_nmds_plot.tiff", dpi = 300, plot = ps_bc_nmds_plot, path = "./beta_diversity_plots/")
```
#### Euclidean PCoA
```{r BD_euc, warning=FALSE, message=FALSE, fig.align="center", results='hide', fig.height=7, fig.width=8}

# Euclidean
#PCOA EUC
ps_euc_pcoa <- ordinate(ps_object, distance = ps_euc, "PCoA")

ps_euc_pcoa_plot <- plot_ordination(ps_object, ps_euc_pcoa, color = "treatment") + 
  geom_point(size = 3) + 
  theme_classic() +
  theme(legend.position = "bottom", legend.title = element_blank()) + 
  scale_colour_manual(values = c("#954535", "#702963", "#F28C28", "#088F8F")) +
  theme(plot.title = element_text(hjust = 0.5))
ps_euc_pcoa_plot

#save as tiff
ggsave("euc_pcoa_plot.tiff", dpi = 300, plot = ps_euc_pcoa_plot, path = "./beta_diversity_plots/")
```
#### Euclidean NMDS
```{r BD_euc2, warning=FALSE, message=FALSE, fig.align="center", results='hide', fig.height=7, fig.width=8}
#NMDS BC
ps_euc_nmds <- ordinate(ps_object, distance = ps_euc, "NMDS")

ps_euc_nmds_plot <- plot_ordination(ps_object, ps_euc_nmds, color = "treatment") + 
  geom_point(size = 3) + 
  theme_classic() +
  theme(legend.position = "bottom", legend.title = element_blank()) + 
  scale_colour_manual(values = c("#954535", "#702963", "#F28C28", "#088F8F")) +
  theme(plot.title = element_text(hjust = 0.5))
ps_euc_nmds_plot

#save as tiff
ggsave("euc_nmds_plot.tiff", dpi = 300, plot = ps_euc_nmds_plot, path = "./beta_diversity_plots/")
```

## Relative Abundance Taxa Plots
### Relative abundance taxa bar plots for treatment
Plots were made at the phylum, genus, and species level, comparing across the 4 treatments.  

Images of each plot can be found in the stats_analysis/taxaplots/ directory. 

#### Phylum
```{r Taxaplots_phy, warning=FALSE, message=FALSE, fig.align="center", results='hide', fig.height=8, fig.width=7}
# transform to relative abundance
ps_rel <- transform(ps_object, "compositional")

# melt the data into a table
ps_rel_melt<- ps_rel %>%
  psmelt()

# Get phylum with mean relative abundance across all samples 
ps_rel_phylum_sum <- ps_rel_melt%>% group_by(Rank3) %>% dplyr::summarise(Aver = mean(Abundance))
names_ps_rel_phylum <- ps_rel_phylum_sum$Rank3
names_ps_rel_phylum

#Phylum Plot
taxaplot_ps_rel_phylum = ggplot(ps_rel_melt
                    , aes(x = treatment, y=Abundance)) + 
  geom_bar(stat="identity", position="fill", aes(fill = Rank3))  +
  scale_fill_manual(values=c("aquamarine4", "gold","lightpink", "firebrick","#DA5724","ivory4",
                             "orchid", "#CBD588", "#8569D5", "#D14285", "#652926","grey80",
                             "#5E738F","#D1A33D", "#8A7C64","lightgreen","aquamarine4",
                             "aquamarine2", "lightsalmon", "#CD9BCD")) +
  theme_bw() +
  ylab("Relative Abundance") + xlab("Treatment") + theme(axis.text.x = element_text(angle = 90, vjust=0.5, hjust=0))
taxaplot_ps_rel_phylum <- taxaplot_ps_rel_phylum + guides(fill=guide_legend(title="Phylum"))
taxaplot_ps_rel_phylum

ggsave("taxaplot_phylum.tiff", dpi = 300, plot = taxaplot_ps_rel_phylum, path = "./taxaplots/")
```
#### Genus
Genera with a relative abundance lower than 0.0001 were grouped together.
```{r Taxaplots_gen, warning=FALSE, message=FALSE, fig.align="center", results='hide', fig.height=8, fig.width=7}

#Get genera with mean realtive abundance >0.0001 across all samples 
ps_rel_genus_sum <- ps_rel_melt%>% group_by(Rank7) %>% dplyr::summarise(Aver = mean(Abundance))
ps_rel_genus_sub <- ps_rel_genus_sum[which(ps_rel_genus_sum$Aver > 0.0001),]
names_ps_rel_genus <- ps_rel_genus_sub$Rank7
names_ps_rel_genus

# Replace genera with <0.001 abundance with "NA"
ps_rel_melt$Rank7[ps_rel_melt$Rank7 != "Abiotrophia" &
                           ps_rel_melt$Rank7 != "Aerococcus" &
                           ps_rel_melt$Rank7 != "Dialister" &
                           ps_rel_melt$Rank7 != "Granulicatella" &
                           ps_rel_melt$Rank7 != "Ignavibacterium" &
                           ps_rel_melt$Rank7 != "Lacticaseibacillus" &
                           ps_rel_melt$Rank7 != "Lacticaseibacillus" &
                           ps_rel_melt$Rank7 != "Lactiplantibacillus" &
                           ps_rel_melt$Rank7 != "Lactobacillus" &
                           ps_rel_melt$Rank7 != "Lancefieldella" &
                           ps_rel_melt$Rank7 != "Lentilactobacillus" &
                           ps_rel_melt$Rank7 != "Levilactobacillus" &
                           ps_rel_melt$Rank7 != "Limosilactobacillus" &
                           ps_rel_melt$Rank7 != "Megasphaera" &
                           ps_rel_melt$Rank7 != "Moryella" &
                           ps_rel_melt$Rank7 != "Shuttleworthia" &
                           ps_rel_melt$Rank7 != "Solobacterium" &
                           ps_rel_melt$Rank7 != "Staphylococcus" &
                           ps_rel_melt$Rank7 != "Streptococcus" &
                           ps_rel_melt$Rank7 != "Veillonella"] <- NA

#replace NA with "Rel. Abund.<0.0001"
ps_rel_melt[is.na(ps_rel_melt)]<-"Rel. Abund.<0.0001"
#Genera plot
taxaplot_ps_rel_genus = ggplot(ps_rel_melt
                    , aes(x = treatment, y=Abundance)) + 
  geom_bar(stat="identity", position="fill", aes(fill = Rank7))  +
  scale_fill_manual(values=c("aquamarine4", "gold","lightpink", "firebrick","#DA5724","ivory4",
                             "orchid", "#CBD588", "#8569D5", "#D14285", "#652926","grey80",
                             "#5E738F","#D1A33D", "#8A7C64","lightgreen","aquamarine4",
                             "aquamarine2", "lightsalmon", "#CD9BCD")) +
  theme_bw() +
  ylab("Relative Abundance") + xlab("Treatment") + theme(axis.text.x = element_text(angle = 90, vjust=0.5, hjust=0))
taxaplot_ps_rel_genus <- taxaplot_ps_rel_genus + guides(fill=guide_legend(title="Genus"))
taxaplot_ps_rel_genus
ggsave("taxaplot_genus.tiff", dpi = 300, plot = taxaplot_ps_rel_genus, path = "./taxaplots/")
```
#### Species
Species with a relative abundance lower than 0.01 were grouped together.
```{r Taxaplots_sp, warning=FALSE, message=FALSE, fig.align="center", results='hide', fig.height=8, fig.width=7}
#Get species with mean realtive abundance >0.01 across all samples 
ps_rel_species_sum <- ps_rel_melt%>% group_by(Rank9) %>% dplyr::summarise(Aver = mean(Abundance))
ps_rel_species_sub <- ps_rel_species_sum[which(ps_rel_species_sum$Aver > 0.01),]
names_ps_rel_species <- ps_rel_species_sub$Rank9
names_ps_rel_species

# Replace genera with <0.001 abundance with "NA"
# Replace genera with <0.001 abundance with "NA"
ps_rel_melt$Rank9[ps_rel_melt$Rank9 != "Lactobacillus crispatus" &
                           ps_rel_melt$Rank9 != "Lactobacillus jensenii" &
                           ps_rel_melt$Rank9 != "Limosilactobacillus fermentum" &
                           ps_rel_melt$Rank9 != "Limosilactobacillus vaginalis" &
                           ps_rel_melt$Rank9 != "Megasphaera micronuciformis" &
                           ps_rel_melt$Rank9 != "Streptococcus anginosus" &
                           ps_rel_melt$Rank9 != "Streptococcus mutans" &
                           ps_rel_melt$Rank9 != "Streptococcus salivarius" &
                           ps_rel_melt$Rank9 != "Streptococcus vestibularis" &
                           ps_rel_melt$Rank9 != "Veillonella atypica" &
                           ps_rel_melt$Rank9 != "Veillonella dispar" &
                           ps_rel_melt$Rank9 != "Veillonella genomosp. P1 oral clone MB5_P17" &
                           ps_rel_melt$Rank9 != "Veillonella sp. oral taxon 158"] <- NA

#replace NA with "Rel. Abund.<0.01"
ps_rel_melt[is.na(ps_rel_melt)]<-"Rel. Abund.<0.01"
#Species plot
taxaplot_ps_rel_species = ggplot(ps_rel_melt
                    , aes(x = treatment, y=Abundance)) + 
  geom_bar(stat="identity", position="fill", aes(fill = Rank9))  +
  scale_fill_manual(values=c("aquamarine4", "gold","lightpink", "firebrick","#DA5724","ivory4",
                             "orchid", "#CBD588", "#8569D5", "#D14285", "#652926","grey80",
                             "#5E738F", "#D1A33D")) +
  theme_bw() +
  ylab("Relative Abundance") + xlab("Treatment") + theme(axis.text.x = element_text(angle = 90, vjust=0.5, hjust=0))
taxaplot_ps_rel_species <- taxaplot_ps_rel_species + guides(fill=guide_legend(title="Species"))
taxaplot_ps_rel_species
ggsave("taxaplot_species.tiff", dpi = 300, plot = taxaplot_ps_rel_species, path = "./taxaplots/")

```

### Relative abundance top 10 taxa bar plots per sample
Plots were made at the phylum, genus, and species level, comparing across all samples.  

Images of each plot can be found in the stats_analysis/taxaplots/ directory. 

#### Top 10 Phyla
```{r Taxaplots2, warning=FALSE, message=FALSE, fig.align="center", results='hide', fig.height=8, fig.width=18} 
# transform to relative abundance
ps_rel <- transform(ps_object, "compositional")

#Top 10 PHYLA
#Get phylum level
ps_phylum <- tax_glom(ps_rel, taxrank= "Rank3")
#Get top 10 phylum
phylum_top10 <- names(sort(taxa_sums(ps_phylum), decreasing=TRUE))[1:10]
ps_phylum_top10 <- prune_taxa(phylum_top10, ps_phylum)

# melt the data into a table
ps_phylum_melt<- ps_phylum_top10 %>%
  psmelt()

# Plot top 10 phylum by sample
taxaplot_ps_top10_phylum = ggplot(ps_phylum_melt
                    , aes(x = sample, y=Abundance)) + 
  geom_bar(stat="identity", position="fill", aes(fill = Rank3))  +
  scale_fill_manual(values=c("aquamarine4", "gold","lightpink", "firebrick","#DA5724","ivory4",
                             "orchid", "#CBD588", "#8569D5", "#D14285")) +
  theme_bw() +
  ylab("Relative Abundance") + xlab("Sample") + theme(axis.text.x = element_text(angle = 90, vjust=0.5, hjust=0)) +
  theme(legend.title = element_text(size=9), legend.text = element_text(size=7))
taxaplot_ps_top10_phylum <- taxaplot_ps_top10_phylum + guides(fill=guide_legend(title="Top 10 Phyla"))
taxaplot_ps_top10_phylum

ggsave("taxaplot_top10_phylum.tiff", dpi = 300, width = 18, height = 8, units = "in", plot = taxaplot_ps_top10_phylum, path = "./taxaplots/")
```
#### Top 10 Genera
```{r Taxaplots_top_gen, warning=FALSE, message=FALSE, fig.align="center", results='hide', fig.height=8, fig.width=18}

#Top 10 GENERA
#Get genus level
ps_genus <- tax_glom(ps_rel, taxrank= "Rank7")
#Get top 10 genera
genus_top10 <- names(sort(taxa_sums(ps_genus), decreasing=TRUE))[1:10]
ps_genus_top10 <- prune_taxa(genus_top10, ps_genus)

# melt the data into a table
ps_genus_melt<- ps_genus_top10 %>%
  psmelt()
#head(ps_genus_melt)

# Plot top 10 phylum by sample
taxaplot_ps_top10_genus = ggplot(ps_genus_melt
                    , aes(x = sample, y=Abundance)) + 
  geom_bar(stat="identity", position="fill", aes(fill = Rank7))  +
  scale_fill_manual(values=c("aquamarine4", "gold3","lightpink", "firebrick","#DA5724","ivory4",
                             "orchid", "#CBD588", "#8569D5", "#D14285")) +
  theme_bw() +
  ylab("Relative Abundance") + xlab("Sample") + theme(axis.text.x = element_text(angle = 90, vjust=0.5, hjust=0)) +
  theme(legend.title = element_text(size=9), legend.text = element_text(size=7))
taxaplot_ps_top10_genus <- taxaplot_ps_top10_genus + guides(fill=guide_legend(title="Top 10 Genera"))
taxaplot_ps_top10_genus

ggsave("taxaplot_top10_genus.tiff", dpi = 300, width = 18, height = 8, units = "in", plot = taxaplot_ps_top10_genus, path = "./taxaplots/")
```
#### Top 10 Species
```{r Taxaplots_top_sp, warning=FALSE, message=FALSE, fig.align="center", results='hide', fig.height=8, fig.width=18}

#Top 10 SPECIES
#Get species level
ps_species <- tax_glom(ps_rel, taxrank= "Rank9")
#Get top 10 genera
species_top10 <- names(sort(taxa_sums(ps_species), decreasing=TRUE))[1:10]
ps_species_top10 <- prune_taxa(species_top10, ps_species)

# melt the data into a table
ps_species_melt<- ps_species_top10 %>%
  psmelt()
#head(ps_species_melt)

# Plot top 10 phylum by sample
taxaplot_ps_top10_species = ggplot(ps_species_melt
                    , aes(x = sample, y=Abundance)) + 
  geom_bar(stat="identity", position="fill", aes(fill = Rank9))  +
  scale_fill_manual(values=c("aquamarine4", "gold3","lightpink", "firebrick","#DA5724","ivory4",
                             "orchid", "#CBD588", "#8569D5", "#D14285")) +
  theme_bw() +
  ylab("Relative Abundance") + xlab("Sample") + theme(axis.text.x = element_text(angle = 90, vjust=0.5, hjust=0)) +
  theme(legend.title = element_text(size=9), legend.text = element_text(size=7))
taxaplot_ps_top10_species <- taxaplot_ps_top10_species + guides(fill=guide_legend(title="Top 10 Species"))
taxaplot_ps_top10_species

ggsave("taxaplot_top10_species.tiff", dpi = 300, width = 18, height = 8, units = "in", plot = taxaplot_ps_top10_species, path = "./taxaplots/")

```
